

Creators/Authors contains: "Shiu, Shin-Han"


  1. Free, publicly-accessible full text available October 31, 2024
  2. The oomycete plant pathogen Phytophthora capsici causes root, crown, and fruit rot of winter squash (Cucurbita moschata) and limits squash production. Some C. moschata cultivars develop age-related resistance (ARR), whereby fruit become resistant to P. capsici 14 to 21 days postpollination (DPP) because of a thickened exocarp; however, wounding negates ARR. Using RNA sequencing, we uncovered the genetic mechanisms of ARR in two C. moschata cultivars, Chieftain and Dickenson Field, which exhibit ARR at 14 and 21 DPP, respectively. Sequencing was conducted on RNA samples from ‘Chieftain’ and ‘Dickenson Field’ fruit at 7, 10, 14, and 21 DPP. Differential expression analysis and subsequent gene set enrichment analysis revealed an overrepresentation of upregulated genes in functional categories related to cell wall structure biosynthesis, cell wall modification/organization, transcription regulation, and metabolic processes. Pathway enrichment analysis detected upregulated genes in the cutin, suberin monomer, and phenylpropanoid biosynthetic pathways. Further analysis of the expression profiles of genes in those pathways revealed upregulation of genes involved in monolignol biosynthesis and lignin polymerization in the resistant fruit peel. Our findings suggest a shift in gene expression toward physical strengthening of the cell wall associated with ARR to P. capsici. These findings provide candidate genes for developing Cucurbita cultivars resistant to P. capsici and for improving fruit rot management in Cucurbita species.

     
    Free, publicly-accessible full text available October 23, 2024
  3. Basic helix–loop–helix (bHLH) proteins constitute one of the largest families of transcription factors (TFs) in eukaryotes, and ~30% of all flowering plants’ bHLH TFs contain the aspartate kinase, chorismate mutase, and TyrA (ACT)-like domain at variable distances C-terminal to the bHLH domain. However, the evolutionary history and functional consequences of the bHLH/ACT-like domain association remain unknown. Here, we show that this domain association is unique to the Plantae kingdom, with green algae (chlorophytes) harboring a small number of bHLH genes with a variable frequency of ACT-like domain presence. bHLH-associated ACT-like domains form a monophyletic group, indicating a common origin. Indeed, phylogenetic analysis suggests that the association of ACT-like and bHLH domains occurred early in Plantae, through the recruitment, by an ancestral bHLH gene, of an ACT-like domain in a common ancestor with the widely distributed ACT DOMAIN REPEAT (ACR) genes. We determined the functional significance of this association by showing that Chlamydomonas reinhardtii ACT-like domains mediate homodimer formation and negatively affect DNA binding by the associated bHLH domains. We show that, while the ACT-like domains have experienced faster selection than the associated bHLH domains, their rates of evolution are strongly and positively correlated, suggesting that the evolution of the ACT-like domains was constrained by the bHLH domains. This study proposes an evolutionary trajectory for the association of ACT-like and bHLH domains, together with experimental characterization of its functional consequences for the regulation of plant-specific processes, highlighting the impact of functional domain coevolution.
    Free, publicly-accessible full text available May 9, 2024
  4. Natural language processing (NLP) techniques can enhance our ability to interpret plant science literature. Many state-of-the-art algorithms for NLP tasks require high-quality labelled data in the target domain, in which entities such as genes and proteins, as well as the relationships between them, are labelled according to a set of annotation guidelines. Although such datasets exist for other domains, comparable resources for the plant sciences have yet to be developed. Here, we present the Plant ScIenCe KnowLedgE Graph (PICKLE) corpus, a collection of 250 plant science abstracts annotated with entities and relations, along with its annotation guidelines. The guidelines were refined through iterative rounds of overlapping annotation, in which inter-annotator agreement was used to improve them. To demonstrate PICKLE’s utility, we evaluated the performance of pretrained models from other domains and trained a new PICKLE-based model for entity and relation extraction (RE). The PICKLE-trained models exhibit the second-highest in-domain entity performance of all models evaluated, as well as RE performance on par with the other models. Additionally, we found that computer science-domain models outperformed models trained on a biomedical corpus (GENIA) in entity extraction, which was unexpected given the intuition that biomedical literature is more similar to PICKLE than computer science literature is. Upon further exploration, we established that the inclusion of new types on which the models were not trained substantially affects performance. The PICKLE corpus is, therefore, an important contribution to training resources for entity extraction and RE in the plant sciences.

     
  5. The plant science corpus consists of the titles and abstracts of plant science articles in PubMed published before 2021, plus a small number of 2021 records that were included because those records had been modified. The columns are: Index (integer index serving as an identifier); PMID (PubMed identifier); Date (publication date); Journal (journal in which the article was published); Title (title of the article); Abstract (abstract of the article); Corpus (title and abstract combined); Text classification score (score from the plant science record prediction model); Preprocessed corpus (Corpus after lower-casing, stop-word removal, removal of characters that are neither alphanumeric nor whitespace, and lemmatisation); Topic (topic index from topic modeling).
  6. Switchgrass lowland ecotypes have significantly higher biomass but lower cold tolerance than upland ecotypes. Understanding the molecular mechanisms underlying cold response, including those at the transcriptional level, can contribute to improving the tolerance of high-yield switchgrass under chilling and freezing conditions. Here, by analyzing an existing switchgrass transcriptome dataset, we computationally dissect the temporal cis-regulatory basis of the switchgrass transcriptional response to cold. We found that the numbers of cold-responsive genes and enriched Gene Ontology terms increased as the duration of cold treatment increased from 30 min to 24 hours, suggesting an amplified response or cascading effect in cold-responsive gene expression. To identify genomic sequences likely important for regulating cold response, we established machine learning models predictive of cold response using k-mer sequences enriched in the genic and flanking regions of cold-responsive genes but not of non-responsive genes. These k-mers, referred to as putative cis-regulatory elements (pCREs), are likely regulatory sequences of cold response in switchgrass. In total there are 655 pCREs, 54 of which are important at all cold treatment time points. Consistent with this, eight of 35 known cold-responsive CREs were similar to top-ranked pCREs in the models, and only these eight were important for predicting temporal cold response. More importantly, most of the top-ranked pCREs were sequences not previously known to be involved in cold regulation. Our findings point to additional, previously unknown sequence elements important for cold-responsive regulation that warrant further study.
  7. New graduate students in biology programs may lack the quantitative skills necessary for their research and professional careers. Acquiring these skills may be impeded by teaching and mentoring experiences that decrease rather than increase students’ belief in their ability to learn and apply quantitative approaches. In this opinion piece, we argue that revising instructional experiences so that both student confidence and quantitative skills are enhanced may improve both educational outcomes and professional success. A few studies suggest that explicitly addressing productive failure in instructional settings and ensuring effective mentoring may be the most effective routes to increasing quantitative self-efficacy and quantitative skills simultaneously. However, little work specifically addresses graduate students’ needs, and more research is required to reach evidence-backed conclusions.
    Free, publicly-accessible full text available April 29, 2024
  8. Background: The availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short-read sequencing technologies and show highly uneven read coverage, indicative of sequencing and assembly issues that could significantly affect any downstream analysis of plant genomes. In tomato, for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of the short-read-based assembly had significantly higher and lower coverage, respectively, than background.
    Results: To understand the causes of such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverage and found that high-coverage regions tend to have higher simple sequence repeat and tandem gene densities than background regions. To determine whether the high-coverage regions were misassembled, we examined a recently available long-read-based tomato assembly and found that 27.8% (1.41 Mb) of high-coverage regions were potentially misassembled duplicate sequences, compared with 1.4% of background regions. In addition, using a predictive model that can distinguish correctly from incorrectly assembled high-coverage regions, we found that misassembled high-coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposable elements.
    Conclusions: Our study provides insights into the causes of variable-coverage regions and a quantitative assessment of the factors contributing to plant genome misassembly when short reads are used; the generality of these causes and factors should be tested further in other species.
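Record 5 lists the preprocessing steps used to build the "Preprocessed corpus" column. The following is a minimal Python sketch of those steps, not the authors' code. It assumes a tiny hypothetical stop-word list (`STOP_WORDS`) and a toy lookup table (`TOY_LEMMAS`) standing in for a real lemmatiser, and it removes non-alphanumeric characters before dropping stop words so that a word with attached punctuation (for example "the,") is still recognised; that ordering is also an assumption. The function name `preprocess` is illustrative.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would more
# likely use a full list such as NLTK's or spaCy's.
STOP_WORDS = {"a", "and", "are", "by", "for", "in", "is", "of", "the", "to"}

# Placeholder lemmatiser: a lookup table standing in for a real lemmatiser
# (for example NLTK's WordNetLemmatizer). Unknown tokens pass through unchanged.
TOY_LEMMAS = {"genes": "gene", "leaves": "leaf", "plants": "plant", "regulates": "regulate"}


def preprocess(title: str, abstract: str) -> str:
    """Build a 'Preprocessed corpus' string from a title and an abstract."""
    # "Corpus" column: title and abstract combined.
    corpus = f"{title} {abstract}"
    # Lower-case everything.
    text = corpus.lower()
    # Remove every character that is not an ASCII letter, digit, or whitespace.
    text = re.sub(r"[^0-9a-z\s]", "", text)
    # Drop stop words, then lemmatise the remaining tokens.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(TOY_LEMMAS.get(t, t) for t in tokens)
```

With these toy tables, `preprocess("Genes regulate leaves", "The plants, in turn, are studied.")` returns `"gene regulate leaf plant turn studied"`. Because the pattern keeps only ASCII letters and digits, a hyphenated term such as "cold-responsive" becomes "coldresponsive" and accented or Greek characters are dropped; a production pipeline might prefer a Unicode-aware pattern.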
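Record 6 describes selecting k-mers enriched in the genic and flanking regions of cold-responsive genes but not of non-responsive genes. One common way to score such enrichment is a one-sided Fisher's exact test on k-mer presence. The sketch below assumes that test, counts a k-mer as present if it occurs anywhere in a sequence (ignoring the reverse complement), and applies no multiple-testing correction. The abstract does not name the statistic, so this illustrates the general idea rather than the authors' method, and it covers only the enrichment step, not the machine learning models trained on the resulting k-mers. The function names `enrichment_pvalue` and `enriched_kmers` are hypothetical.

```python
from math import comb


def enrichment_pvalue(kmer: str, pos: list[str], neg: list[str]) -> float:
    """One-sided Fisher's exact test: is `kmer` over-represented in `pos`?"""
    a = sum(kmer in s for s in pos)   # responsive sequences containing the k-mer
    b = sum(kmer in s for s in neg)   # non-responsive sequences containing it
    total = len(pos) + len(neg)       # population size
    with_kmer = a + b                 # "successes" in the population
    draws = len(pos)                  # size of the responsive set
    # Upper tail of the hypergeometric distribution: P(X >= a).
    upper = sum(
        comb(with_kmer, i) * comb(total - with_kmer, draws - i)
        for i in range(a, min(with_kmer, draws) + 1)
    )
    return upper / comb(total, draws)


def enriched_kmers(pos: list[str], neg: list[str], k: int, alpha: float = 0.05) -> dict[str, float]:
    """Return {k-mer: p-value} for k-mers drawn from `pos` with p <= alpha."""
    # Candidate k-mers: every length-k substring of the responsive sequences.
    candidates = {s[i:i + k] for s in pos for i in range(len(s) - k + 1)}
    result = {}
    for kmer in sorted(candidates):
        p = enrichment_pvalue(kmer, pos, neg)
        if p <= alpha:
            result[kmer] = p
    return result
```

With `pos = ["ACGTGAAA", "TTACGTGC", "ACGTGGGG"]` and `neg = ["TTTTTTTT", "GGGGCCCC", "AAAACCCC"]`, `enriched_kmers(pos, neg, 5)` returns `{"ACGTG": 0.05}`: that 5-mer occurs in all three responsive sequences and in none of the others, and the probability of that split under the hypergeometric null is 1/20. For genome-scale data one would typically also scan the reverse complement, correct for the many k-mers tested, and might use a library routine such as `scipy.stats.fisher_exact`.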